Added performance analysis as a feature with AutoModelForCausalLM #888

Open
ochougul wants to merge 3 commits into main from get_perf

@ochougul
Contributor

Summary

  • Added evaluate_performance(...) to QEFFAutoModelForCausalLM for end-to-end performance analysis: compile + qaic-runner + qaic-opstats.
  • Compile perf flags are always enabled: aic_perf_metrics=True, aic_perf_warning=True; for raw_device_stats, also force stats_level=70, ddr_stats=True, aic_pmu_recipe="KernelUtil".
  • Added a prefill_only argument to evaluate_performance(...); it is forwarded to compile(...).
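
The flag rules in the Summary can be sketched as a small helper. This is only an illustration: `build_perf_compile_flags` is a hypothetical name, not a function from this PR, and treating `raw_device_stats` as a boolean switch is an assumption. The flag names and values are taken from the description above.

```python
from typing import Any, Dict


def build_perf_compile_flags(raw_device_stats: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: collect the perf-related compile flags named in this PR."""
    # The description says these two flags are always enabled.
    flags: Dict[str, Any] = {
        "aic_perf_metrics": True,
        "aic_perf_warning": True,
    }
    # Assumption: raw_device_stats is a boolean switch. When it is set,
    # the three extra flags are forced regardless of caller input.
    if raw_device_stats:
        flags.update(
            stats_level=70,
            ddr_stats=True,
            aic_pmu_recipe="KernelUtil",
        )
    return flags
```

Building the flags in one place would make it harder for a caller to switch off the always-on metrics by accident.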

Key Behavior Changes

Stage selection is now:

  • prefill_only=True -> prefill-only
  • prefill_seq_len==1 -> decode-only
  • otherwise -> both prefill + decode
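
A minimal sketch of the selection rule above, for reviewers who want to check edge cases. `select_stages` is a hypothetical name and the stage labels are illustrative. The sketch assumes the rules are evaluated in the order listed, so `prefill_only=True` wins even when `prefill_seq_len == 1`.

```python
from typing import List


def select_stages(prefill_only: bool, prefill_seq_len: int) -> List[str]:
    """Hypothetical helper mirroring the stage-selection rules in this PR."""
    # Rule 1: an explicit prefill_only request takes precedence (assumed ordering).
    if prefill_only:
        return ["prefill"]
    # Rule 2: a prefill sequence length of 1 means there is only decode work.
    if prefill_seq_len == 1:
        return ["decode"]
    # Rule 3: otherwise both stages are analyzed.
    return ["prefill", "decode"]
```

For example, `select_stages(False, 128)` yields both stages, while `select_stages(False, 1)` yields only decode.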

Artifacts and Paths

  • Standardized output layout: compile/, io/, performance_analysis/.
  • Added per-stage subdirs for prefill/decode under io, profiling, runner_outputs, and opstats.
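
One possible reading of this layout, sketched with `pathlib`. The nesting of profiling, runner_outputs, and opstats under performance_analysis/ is an assumption, since the description does not state where they live. `planned_dirs` is a hypothetical helper that only computes paths and creates nothing on disk.

```python
from pathlib import Path
from typing import Iterable, List


def planned_dirs(root: Path, stages: Iterable[str]) -> List[Path]:
    """Hypothetical helper: list the directories one run might produce."""
    stage_names = list(stages)  # allow generators to be passed safely
    analysis = root / "performance_analysis"
    # Top-level layout named in the PR description.
    dirs = [root / "compile", root / "io", analysis]
    # Assumption: the three analysis outputs sit under performance_analysis/.
    staged_parents = [
        root / "io",
        analysis / "profiling",
        analysis / "runner_outputs",
        analysis / "opstats",
    ]
    # Each of these parents gets one subdirectory per selected stage.
    for parent in staged_parents:
        for stage in stage_names:
            dirs.append(parent / stage)
    return dirs
```

Calling `planned_dirs(Path("out"), ["prefill", "decode"])` would list, among others, `out/io/prefill` and `out/performance_analysis/opstats/decode`.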

Validation

  • Expanded tests in tests/unit_test/utils/test_auto_model_api.py; status: 70 passed.
  • Hardware smoke test verified both cases:
      • --prefill-only -> only prefill artifacts
      • --prompt-len 1 (without --prefill-only) -> only decode artifacts

…ausalLM class

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
@ochougul ochougul self-assigned this Mar 26, 2026